Abstract

Causal inference uses observations to infer the causal structure of the data generating system. 
We study a class of functional models that we call Time Series Models with Independent Noise 
(TiMINo). These models require independent residual time series, whereas traditional methods like 
Granger causality exploit the variance of residuals. There are two main contributions: (1) Theoretical: 
By restricting the model class (e.g. to additive noise) we can provide a more general identifiability 
result than existing ones. This result incorporates lagged and instantaneous effects that can be 
nonlinear and do not need to be faithful, and non-instantaneous feedbacks between the time series. 
(2) Practical: If there are no feedback loops between time series, we propose an algorithm based 
on nonlinear independence tests of time series. When the data are causally insufficient, or the data 
generating process does not satisfy the model assumptions, this algorithm may still give partial results, 
but mostly avoids incorrect answers. An extension to (non-instantaneous) feedbacks is possible, but 
not discussed. The algorithm outperforms existing methods on artificial and real data. Code can be provided 
upon request. 



1 Introduction 

We consider finitely many time series $X^i_t$, $i \in V$, with a maximal time order of $p$, that is, we assume no 
influence from $X^i_{t-k}$ on $X^j_t$ for $k > p$. We further assume stationarity: the influence from $X^i_{t-k}$ on $X^j_t$ is 
required to be the same for all $t$. The question whether $X^i$ is causing $X^j$ now reads as whether there is 
a causal influence from some $X^i_{t-k}$ on $X^j_t$, for $0 \le k \le p$. All models assume homoscedastic noise. 

We first review causal inference on iid data, that is, in the case with no time structure, in Section 2. 
Note that iid methods cannot be applied directly to time series data because a common history might 
introduce complicated dependencies between contemporaneous data $X_t$ and $Y_t$. Motivated by the iid 
case, Chu and Glymour [2008] and Hyvärinen et al. [2008] propose approaches for the time series setting 
that include linear instantaneous effects. We describe these methods together with Granger causality in 
Section 3. All of them encounter similar problems: none of them is general enough to include nonlinear 
instantaneous effects or hidden common causes. Furthermore, when the model assumptions are violated, 
the methods give incorrect results and one draws false causal conclusions without noticing. We propose to 
use time series models with independent noise (TiMINo) that include nonlinear and instantaneous effects. 
The model is based on functional models (also known as structural equation models) and assumes $X_t$ to 
be a function of all direct causes and some noise variable, the collection of which is supposed to be jointly 
independent. This constitutes a relatively straightforward extension of iid methods, but we regard the 
benefits in the setting of time series as substantial: In Section 4 we prove that for TiMINo models the 
full causal structure can be recovered from the distribution. Section 5 introduces an algorithm (TiMINo 
causality) that recovers the model structure from a finite sample. It covers a broader class of models than 
existing methods and can be run with any provided algorithm for fitting time series. If the data do not 
satisfy the assumptions, TiMINo causality remains mostly undecided (see Section 5.3) instead of drawing 
wrong causal conclusions. The methods are applied to simulated and real data sets in Section 6. 



2 Causal inference on iid data 

Inferring causal relations from observational data is challenging when interventions are not applicable. 
Given iid samples from $P_{(X^i)_{i \in V}}$, we try to find the underlying causal structure of the variables $X^i$, $i \in V$. 



2.1 Directed acyclic graphs and constraint-based methods 

Let $X^i$, $i \in V$, be a set of random variables and $\mathcal{G}$ a directed acyclic graph (DAG) on $V$. The joint 
distribution is said to be Markov with respect to the DAG $\mathcal{G}$ if each variable is independent of its non-
descendants given its parents. The distribution is faithful with respect to $\mathcal{G}$ if all conditional independences 
are entailed by the Markov assumption. Constraint-based methods [e.g. Spirtes et al., 2000] assume that 
the joint distribution is Markov and faithful with respect to the true causal DAG. They show how to 
exploit conditional independences for reconstructing the graph $\mathcal{G}$, e.g. using the PC algorithm; but the 
graph can only be recovered up to Markov equivalence classes. E.g., $X \to Y$ and $Y \to X$ cannot be 
distinguished. 
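As an illustration of the Markov condition, the following sketch simulates the chain $X \to Y \to Z$. It rests on an assumption that goes beyond the text: the mechanisms are linear-Gaussian, so conditional independence can be read off partial correlations. $X$ and $Z$ are dependent, but independent given $Y$; the reversed chain $Z \to Y \to X$ entails exactly the same independence, which is why a constraint-based method cannot orient it.

```python
import numpy as np

def partial_corr(a, b, c):
    """Partial correlation of a and b given c (residuals of linear regressions on c)."""
    C = np.column_stack([np.ones_like(c), c])
    ra = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    rb = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
n = 100_000
# Chain X -> Y -> Z with illustrative linear-Gaussian mechanisms
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

marginal = np.corrcoef(x, z)[0, 1]   # clearly nonzero (about 0.45 in theory)
conditional = partial_corr(x, z, y)  # close to zero: X independent of Z given Y
print(marginal, conditional)
```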



2.2 Functional models and additive noise 



Functional models [Pearl, 2009] provide a different approach to the problem described above: We say 
$P_{(X^i)_{i \in V}}$ satisfies a functional model if for all $i \in V$ there exists a set of nodes $\mathbf{PA}^i \subseteq X^{V \setminus \{i\}}$, a function 
$f_i$ and a noise variable $N^i$ such that we can write $X^i = f_i(\mathbf{PA}^i, N^i)$. (For any subset $A \subseteq V$ we define 
$X^A := \{X^j \mid j \in A\}$.) Additionally, we require $(N^i)_{i \in V}$ to be jointly independent and the graph obtained 
by drawing arrows from all elements of $\mathbf{PA}^i$ to $X^i$ (for each $i \in V$) to be acyclic. By restricting the 
function class one can identify the bivariate case: Shimizu et al. [2006] show that if $P_{(X,Y)}$ allows for 
$Y = a \cdot X + N_Y$ with $N_Y \perp\!\!\!\perp X$, then $P_{(X,Y)}$ only allows for $X = b \cdot Y + N_X$ with $N_X \perp\!\!\!\perp Y$ if $(X, N_Y)$ are jointly Gaussian 
($\perp\!\!\!\perp$ stands for statistical independence). This idea has led to the extensions of nonlinear additive 
functions $f(x, n) = g(x) + n$ [Hoyer et al., 2009], post-nonlinear additive functions $f(x, n) = h(g(x) + n)$ 
[Zhang and Hyvärinen, 2009] and discrete functions [Peters et al., 2011a]. Peters et al. [2011b] show that 
identifiability in the bivariate case is enough for multiple variables. Mooij et al. [2009] provide practical 
ANM-based methods for more than two variables. Sections 4 and 5 apply these principles to time series. 
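The bivariate identifiability result of Shimizu et al. [2006] can be seen numerically. The sketch below is an illustration under our own assumptions (uniform noise, ordinary least squares, and a raw HSIC statistic with Gaussian kernels compared across the two directions instead of a proper significance test): in the causal direction the regression residual is independent of the regressor, while in the anticausal direction it is only uncorrelated with it.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC statistic with Gaussian kernels (median-distance bandwidth)."""
    def centered_gram(v):
        d = np.abs(v[:, None] - v[None, :])
        K = np.exp(-d ** 2 / (2 * np.median(d[d > 0]) ** 2))
        return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    return np.sum(centered_gram(a) * centered_gram(b)) / len(a) ** 2

def ols_residual(target, regressor):
    """Residual of the linear fit target ~ intercept + slope * regressor."""
    A = np.column_stack([np.ones_like(regressor), regressor])
    return target - A @ np.linalg.lstsq(A, target, rcond=None)[0]

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(-1, 1, n)       # non-Gaussian cause
y = x + rng.uniform(-1, 1, n)   # Y = a*X + N_Y with a = 1 and N_Y independent of X

forward = hsic(ols_residual(y, x), x)    # causal direction: residual independent of X
backward = hsic(ols_residual(x, y), y)   # anticausal direction: residual depends on Y
print(forward, backward)                 # forward is expected to be much smaller
```

With Gaussian noise and a Gaussian cause both statistics would be small, which is the non-identifiable linear-Gaussian case excluded above.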



3 Causal inference on time series: existing methods 

For each $i$ from a finite $V$, let $(X^i_t)_{t \in \mathbb{N}}$ be a time series. $X_t$ denotes the vector of time series values at 
time $t$. We call the infinite graph that contains each variable $X^i_t$ as a node the full time graph. The 
summary time graph contains all $\#V$ components of the time series as vertices and an arrow between 
$X^i$ and $X^j$, $i \neq j$, if there is an arrow from $X^i_{t-k}$ to $X^j_t$ in the full time graph for some $k$. This work 
addresses the following 

Problem: Given a sample $(X_1, \ldots, X_T)$ of a multivariate time series, recover the true causal summary time graph. 
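The passage from the full time graph to the summary time graph is a simple projection. A minimal sketch, assuming full-time-graph edges are given as hypothetical triples `(i, j, k)` meaning $X^i_{t-k} \to X^j_t$ (with $k = 0$ for instantaneous effects); self-loops $X^i_{t-k} \to X^i_t$ only express time structure and are dropped. An acyclicity check is included because Theorem 1 (ii) requires an acyclic summary time graph.

```python
def summary_graph(full_edges):
    """Collapse full-time-graph edges (i, j, k), meaning X^i_{t-k} -> X^j_t,
    into summary-time-graph edges X^i -> X^j with i != j."""
    return {(i, j) for (i, j, k) in full_edges if i != j}

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no directed cycle."""
    indegree = {v: 0 for v in nodes}
    for _, j in edges:
        indegree[j] += 1
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for a, b in edges:
            if a == v:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return visited == len(nodes)

# Full time graph of Experiment 2 (Section 6.1), written as (cause, effect, lag)
full = [("X", "X", 1), ("W", "W", 1), ("X", "W", 0),
        ("Y", "Y", 1), ("W", "Y", 1),
        ("Z", "Z", 1), ("W", "Z", 0), ("Y", "Z", 1)]
summary = summary_graph(full)
print(sorted(summary))   # [('W', 'Y'), ('W', 'Z'), ('X', 'W'), ('Y', 'Z')]
print(is_acyclic({"X", "W", "Y", "Z"}, summary))  # True
```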

3.1 Granger causality 

Granger causality [Granger, 1969] (G-causality for the remainder of the article) does not require complicated 
statistics, it is easy to implement, and it is based on the following idea: $X^i$ does not Granger cause 
$X^j$ if including the past of $X^i$ does not help in predicting $X^j_t$ given the past of all other time series 
$X^k$, $k \neq i$. In principle, "all other" means all other information in the world. In practice, one is limited 
to $X^k$, $k \in V$. In order to translate the phrase "does not help" into mathematical language, we need 
to assume a multivariate time series model. If the data follow the assumed model, e.g. the VAR model of 
Section 3.1.1, G-causality is sometimes interpreted as testing whether $X^i_{t-h}$, $h > 0$, is independent of $X^j_t$ 
given $X^k_{t-h}$, $k \in V \setminus \{i\}$, $h > 0$ [see Section 3.2]. 
3.1.1 Linear Granger causality 

Linear G-causality considers a VAR model: $X_t = \sum_{\tau=1}^{p} A(\tau) X_{t-\tau} + N_t$, where $X_t$ and $N_t$ are vectors 
and $A(\tau)$ are matrices. For checking whether $X^i$ G-causes $X^j$, one fits a full VAR model $M_{\mathrm{full}}$ to $X_t$ and a 
VAR model $M_{\mathrm{restr}}$ to $X_t$ with the constraints $A_{ji}(\tau) = 0$ for all $1 \le \tau \le p$, which predicts $X^j_t$ without using 
$X^i$. Then one checks whether the reduction of the residual sum of squares (RSS) of $X^j_t$ is significant by 
using the test statistic 
$$T := \frac{(\mathrm{RSS}_{\mathrm{restr}} - \mathrm{RSS}_{\mathrm{full}})/(p_{\mathrm{full}} - p_{\mathrm{restr}})}{\mathrm{RSS}_{\mathrm{full}}/(N - p_{\mathrm{full}})},$$
where $p_{\mathrm{full}}$ and $p_{\mathrm{restr}}$ are the number of parameters in the respective models and $N$ is the number of data points. 
For the significance test we use $T \sim F_{p_{\mathrm{full}} - p_{\mathrm{restr}},\, N - p_{\mathrm{full}}}$. 
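The F-test above can be sketched with ordinary least squares as follows. This is an illustrative bivariate version under our own assumptions (fixed order `p`, intercept included, `scipy.stats.f` for the $F$ tail probability); in the multivariate case both models would additionally contain the lags of all remaining series $X^k$, $k \in V \setminus \{i, j\}$.

```python
import numpy as np
from scipy import stats

def lagged_design(series, p):
    """Columns: intercept and lags 1..p of every series, aligned with times t = p..T-1."""
    T = len(series[0])
    cols = [np.ones(T - p)]
    for s in series:
        for lag in range(1, p + 1):
            cols.append(s[p - lag:T - lag])
    return np.column_stack(cols)

def granger_f_test(x_i, x_j, p=1):
    """Test whether the past of x_i helps to predict x_j beyond the past of x_j."""
    target = x_j[p:]
    full = lagged_design([x_j, x_i], p)     # M_full
    restr = lagged_design([x_j], p)         # M_restr: coefficients of x_i set to zero
    def rss(A):
        coef = np.linalg.lstsq(A, target, rcond=None)[0]
        return np.sum((target - A @ coef) ** 2)
    p_full, p_restr, N = full.shape[1], restr.shape[1], len(target)
    T = ((rss(restr) - rss(full)) / (p_full - p_restr)) / (rss(full) / (N - p_full))
    return T, stats.f.sf(T, p_full - p_restr, N - p_full)

# Example: x drives y with lag 1, but not vice versa
rng = np.random.default_rng(0)
n = 1000
e = rng.standard_normal((n, 2))
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t, 0]
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + e[t, 1]
T_xy, p_xy = granger_f_test(x, y, p=2)   # a very small p-value is expected here
T_yx, p_yx = granger_f_test(y, x, p=2)
print(p_xy, p_yx)
```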



3.1.2 Nonlinear Granger causality 



G-causality has been extended to nonlinear relationships [e.g. Chen et al., 2004, Ancona et al., 2004]. In 
this paper we focus on an extension for the bivariate case proposed by Bell et al. [1996]. It is based on 
generalized additive models (gams): 
$$X^i_t = \sum_{r=1}^{p} \sum_{j \in V} f_{i,j,r}(X^j_{t-r}) + N^i_t, \qquad i \in V,$$
where $N_t$ is a $\#V$-dimensional noise vector. In order to test whether $X^i$ G-causes $X^j$ for 
order 1, two models are fit: $X^j_t = g_1(X^j_{t-1}) + N_t$ and $X^j_t = g_2(X^j_{t-1}) + g_3(X^i_{t-1}) + M_t$. Bell et al. [1996] 
utilize the same F statistic as above; this time $p_{\mathrm{full}}$ and $p_{\mathrm{restr}}$ are the estimated degrees of freedom of the 
corresponding models. They refer to simulation studies by Hastie and Tibshirani [1990]. 



3.2 ANLTSM 



Following Bell et al. [1996], Chu and Glymour [2008] introduce additive nonlinear time series models 
(ANLTSM for short) for performing relaxed conditional independence tests: If including one variable, 
e.g. $X^1_{t-1}$, into a model for $X^2_t$ that already includes $X^2_{t-2}$, $X^2_{t-1}$, and $X^1_{t-2}$ does not improve the 
predictability of $X^2_t$, then $X^1_{t-1}$ is said to be independent of $X^2_t$ given $X^2_{t-2}, X^2_{t-1}, X^1_{t-2}$ (if the maximal 
time lag is 2). Chu and Glymour [2008] propose a method based on constraint-based methods like FCI 
[Spirtes et al., 2000] in order to infer the causal structure exploiting those conditional independence 
statements. The instantaneous effects are assumed to be linear and the confounders linear and instantaneous. 
Unfortunately, we did not find code for this method. 



3.3 TS-LiNGAM 



LiNGAM [Shimizu et al., 2006] infers causal graphs for linear, non-Gaussian data. It has been extended 
to time series by Hyvärinen et al. [2008] (for short: TS-LiNGAM). It allows for instantaneous effects; all 
relationships are assumed to be linear. Hidden confounders and nonlinearities may lead to wrong results. 



3.4 Limitations of existing methods 

The approaches described above suffer from the following methodological problems. (1) Instantaneous 
effects: The formulation of G-causality has the intrinsic problem that it cannot deal with instantaneous 
effects. E.g., when $X_t$ is causing $Y_t$, including any of the two time series helps for predicting the other. 
Thus G-causality infers $X \to Y$ and $Y \to X$. ANLTSM and TS-LiNGAM only allow linear instantaneous 
effects. Theorem 1 shows that the causal summary time graph may still be identifiable when the instantaneous 
effects are linear and the variables are jointly Gaussian. TS-LiNGAM does not work in these 
situations. (2) Confounders: G-causality might fail, for example, when there is a confounder between $X_t$ and $Y_{t+1}$: 
the path between $X_t$ and $Y_{t+1}$ cannot be blocked by conditioning on any of the observed 
variables, so G-causality infers $X \to Y$. ANLTSM does not allow for nonlinear confounders or confounders 
with time structure, and TS-LiNGAM may fail, too (Exp. 1). (3) Bad model assumptions: The methods 
share a similar problem: Performing general conditional independence tests is desirable but not feasible, 
partially because the conditioning sets are too large [e.g. Bergsma, 2004]. Thus, the test is performed 
under a simple model, for example a linear one. If the model assumption is violated, one may draw 
wrong conclusions without noticing (e.g. Exp. 3). For TiMINo, which we define below, Lemma 1 shows 
that after fitting and checking the model by testing for independent residuals, the difficult conditional 
independences have been checked implicitly. 

Thus, a model check is a simple but effective improvement. Although G-causality for two time series 
can easily be augmented with a cross-correlation test, we do not see a straightforward extension 
to multivariate G-causality. Furthermore, testing for cross-correlation does not always suffice (see 
Section 5.1). 



4 Functional models for time series: TiMINo 



We define TiMINo, a model class including the models described above and prove its identifiability. 

Definition 1 Consider a time series $X_t = (X^i_t)_{i \in V}$ such that the finite dimensional distributions are 
absolutely continuous with respect to a product measure (i.e. there is a pdf or a pmf). We say the time 
series satisfies a TiMINo if there is a $p > 0$ and if for all $i \in V$ there are sets $\mathbf{PA}^i_0 \subseteq X^{V \setminus \{i\}}$, $\mathbf{PA}^i_k \subseteq X^V$ ($1 \le k \le p$), such that 
for all $t$ 

$$X^i_t = f_i\big((\mathbf{PA}^i_p)_{t-p}, \ldots, (\mathbf{PA}^i_1)_{t-1}, (\mathbf{PA}^i_0)_t, N^i_t\big) \qquad (1)$$

with the $N^i_t$ (jointly) independent and, for each $i$, $N^i_t$ identically distributed in $t$. The corresponding full time 
graph is obtained by drawing arrows from any node that appears on the right-hand side of (1) to $X^i_t$. We 
require the full time graph to be acyclic. 
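To make Definition 1 concrete, here is a minimal simulation sketch for the special case of additive noise, $f_i(\ldots, N^i_t) = g_i(\ldots) + N^i_t$ (an assumption; Definition 1 allows general $f_i$). The mechanisms are hypothetical Python callables that read values through a helper `lag(j, k)` returning $X^j_{t-k}$; instantaneous parents ($k = 0$) must precede the child in `order`, which keeps the full time graph acyclic. The example contains a lagged nonlinear and an instantaneous effect from $X$ to $Y$.

```python
import numpy as np

def simulate_timino(mechanisms, order, T, p, noise, rng, burn_in=200):
    """Simulate X^i_t = g_i(lagged and instantaneous parents) + N^i_t.

    mechanisms: dict i -> g_i(lag), where lag(j, k) returns X^j_{t-k};
        k = 0 may only refer to nodes that precede i in `order`.
    order: causal order of the instantaneous effects (keeps the full time graph acyclic).
    noise: callable rng -> one draw of N^i_t (same law for all i and t in this sketch).
    """
    n = T + burn_in
    x = {i: np.zeros(n) for i in order}
    for t in range(p, n):
        for i in order:
            def lag(j, k, t=t):
                return x[j][t - k]
            x[i][t] = mechanisms[i](lag) + noise(rng)
    return {i: x[i][burn_in:] for i in order}   # drop the burn-in period

# Usage: lagged nonlinear effect X_{t-1} -> Y_t and instantaneous effect X_t -> Y_t
mechanisms = {
    "X": lambda lag: 0.5 * lag("X", 1),
    "Y": lambda lag: 0.4 * lag("Y", 1) + np.sin(lag("X", 1)) + 0.8 * lag("X", 0),
}
data = simulate_timino(mechanisms, order=["X", "Y"], T=1000, p=1,
                       noise=lambda r: r.uniform(-0.5, 0.5),
                       rng=np.random.default_rng(0))
print(np.corrcoef(data["X"], data["Y"])[0, 1])
```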



Below we assume that equations (1) follow an identifiable functional model class (IFMOC); Peters 
et al. [2011b] give a precise definition. Basically, this means that (I) causal minimality holds, a weak form of 
faithfulness that assumes a statistical dependence between cause and effect given all other parents [Spirtes 
et al., 2000], and (II) all $f_i$ come from a function class (e.g. additive noise) that is small enough to make 
the bivariate case identifiable (Section 2.2) if we exclude certain function-input-noise combinations like 
linear-Gaussian-Gaussian. The proof of the following theoretical result can be found in the appendix. 

Theorem 1 Suppose that $X_t$ can be represented as a TiMINo with $\mathbf{PA}(X^i_t) = \bigcup_{k=0}^{p} (\mathbf{PA}^i_k)_{t-k}$ being the 
direct causes of $X^i_t$ and that one of the following holds: 

(i) Equations (1) come from an IFMOC. 

(ii) Each component of the time series exhibits a time structure (i.e. $\mathbf{PA}(X^i_t)$ contains at least one 
$X^i_{t-k}$), the joint distribution is faithful with respect to the full time graph, and the summary time 
graph is acyclic. 

Then the full time graph can be recovered from the joint distribution. In particular, the true causal 
summary time graph is identifiable. (Note that neither of the two conditions implies the other.) 



Regarding (i): Many choices of a function class are possible [Peters et al., 2011b]. In practice, however, 
one still needs to fit those functions $f_i$ from the data, which for additive noise means that estimating 
$E[X^i_t \mid X_{t-p}, \ldots, X_{t-1}]$ should be feasible. Different results show that stationarity and/or a mixing condition, or 
geometric ergodicity, are required [e.g. Chu and Glymour, 2008]. In this work we consider VAR fitting: 
$f_i(p_1, \ldots, p_r, n) = a_{i,1} \cdot p_1 + \ldots + a_{i,r} \cdot p_r + n$; gam regression: $f_i(p_1, \ldots, p_r, n) = f_{i,1}(p_1) + \ldots + f_{i,r}(p_r) + n$ 
[e.g. Bell et al., 1996]; and GP regression: $f_i(p_1, \ldots, p_r, n) = g_i(p_1, \ldots, p_r) + n$. Note that linear functions 
lead to the model of Hyvärinen et al. [2008] as a special case. 

Regarding (ii): This condition nicely shows how the time structure not only makes the causal inference 
problem harder (the iid assumption is dropped), but also easier. In the iid case, for example, the true 
graph is not identifiable if all components are jointly Gaussian and the relationships are linear; with time 
structure it is. (TS-LiNGAM would fail, though.) 
5 A practical method: TiMINo causality 



The algorithm for TiMINo causality is based on the theoretical finding in Theorem 1. It takes the 
time series data as input and either outputs a DAG that estimates the summary time graph or remains 
undecided. In principle, it tries to fit a TiMINo model to the data and outputs the corresponding graph. 
If no model with independent residuals is found, it outputs "I do not know". For a time series with many 
components, trying all candidate models becomes intractable. In Section 6 we concentrate on time series without feedback loops, 
where we can exploit a more efficient method: 



5.1 Full causal discovery 



For additive noise models (ANMs) without time structure, Mooij et al. [2009] propose a procedure that 
recovers the structure without enumerating all possible DAGs. This procedure can be modified to be of 
use for time series (Algorithm 1). As reported by Mooij et al. [2009], the time complexity is $O(d^2)$, where 
$d$ is the number of time series, regarding fitting models and independence testing as atomic operations. 
To get the full time complexity, $O(d^2)$ has to be multiplied by the sum of the complexities of the regression 
method and the independence test, both chosen by the user. 



Algorithm 1 TiMINo causality 
1: Input: Samples from a $d$-dimensional time series of length $T$: $(X_1, \ldots, X_T)$, maximal order $p$ 
2: $S := (1, \ldots, d)$ 
3: repeat 
4:   for $k$ in $S$ do 
5:     Fit TiMINo for $X^k_t$ using $X^k_{t-p}, \ldots, X^k_{t-1}, X^i_{t-p}, \ldots, X^i_{t-1}, X^i_t$ for $i \in S \setminus \{k\}$ 
6:     Test if residuals are indep. of $X^i$, $i \in S$. 
7:   end for 
8:   Choose $k^*$ to be the $k$ with the weakest dependence. (If there is no $k$ with independence, break 
     and output: "I do not know - bad model fit".) 
9:   $S := S \setminus \{k^*\}$ 
10:  $\mathrm{pa}(k^*) := S$ 
11: until length($S$) = 1 
12: For all $k$ remove all unnecessary parents. 
13: Output: $(\mathrm{pa}(1), \ldots, \mathrm{pa}(d))$ 
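A compact sketch of Algorithm 1 in the spirit of TiMINo-linear follows. It makes several simplifying assumptions of ours: linear least-squares fits of fixed order `p` instead of `ar` with AIC; a permutation version of HSIC instead of the test of Gretton et al. [2008]; residuals tested against the other remaining series only, at shifts $-p, \ldots, p$ (a residual is trivially dependent on future values of its own series); a Bonferroni correction over these tests; and no pruning step (line 12), so the returned parent sets are supersets. All function names are hypothetical.

```python
import numpy as np

def centered_gram(v):
    """Double-centered Gaussian Gram matrix with median-distance bandwidth."""
    d = np.abs(v[:, None] - v[None, :])
    K = np.exp(-d ** 2 / (2 * np.median(d[d > 0]) ** 2))
    return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()

def hsic_pvalue(a, b, rng, n_perm):
    """Permutation p-value of the biased HSIC statistic between samples a and b."""
    Ka, Kb = centered_gram(a), centered_gram(b)
    stat = np.sum(Ka * Kb)
    exceed = 0
    for _ in range(n_perm):
        q = rng.permutation(len(b))
        exceed += np.sum(Ka * Kb[np.ix_(q, q)]) >= stat
    return (1 + exceed) / (1 + n_perm)

def fit_residuals(X, k, others, p):
    """Line 5: regress X^k_t on lags 1..p of X^k and of `others`, plus `others` at time t."""
    T = X.shape[0]
    cols = [np.ones(T - p)]
    for i in [k] + others:
        cols += [X[p - lag:T - lag, i] for lag in range(1, p + 1)]
    cols += [X[p:, i] for i in others]        # instantaneous regressors
    A, y = np.column_stack(cols), X[p:, k]
    return y - A @ np.linalg.lstsq(A, y, rcond=None)[0]   # residuals for t = p..T-1

def shifted(r, x, h, p):
    """Pair residual r_t (t = p..T-1) with x_{t+h}, dropping times outside the sample."""
    t = np.arange(p, p + len(r))
    ok = (t + h >= 0) & (t + h < len(x))
    return r[ok], x[t[ok] + h]

def timino_linear(X, p=1, alpha=0.05, n_perm=500, seed=0):
    """Returns {k: candidate parents of k} or None ("I do not know - bad model fit")."""
    rng = np.random.default_rng(seed)
    S, parents = list(range(X.shape[1])), {}
    while len(S) > 1:                                   # lines 3-11
        scores = {}
        for k in S:                                     # lines 4-7
            others = [i for i in S if i != k]
            r = fit_residuals(X, k, others, p)
            pvals = [hsic_pvalue(*shifted(r, X[:, i], h, p), rng, n_perm)
                     for i in others for h in range(-p, p + 1)]
            scores[k] = min(1.0, min(pvals) * len(pvals))   # Bonferroni-corrected
        k_star = max(scores, key=scores.get)            # line 8: weakest dependence
        if scores[k_star] <= alpha:
            return None
        S.remove(k_star)                                # line 9
        parents[k_star] = list(S)                       # line 10
    parents[S[0]] = []
    return parents                                      # line 12 (pruning) omitted

# Usage: lagged chain X^0 -> X^1 -> X^2 (illustrative coefficients)
rng = np.random.default_rng(1)
T = 200
E = rng.standard_normal((T, 3))
X = np.zeros((T, 3))
for t in range(1, T):
    X[t, 0] = 0.5 * X[t - 1, 0] + E[t, 0]
    X[t, 1] = 0.5 * X[t - 1, 1] + 0.9 * X[t - 1, 0] + E[t, 1]
    X[t, 2] = 0.5 * X[t - 1, 2] + 0.9 * X[t - 1, 1] + E[t, 2]
result = timino_linear(X, p=1, alpha=0.01)
print(result)
```

On such data the sketch should typically remove series 2 first (the sink) and then series 1; the number of fits per run is $d + (d-1) + \ldots + 2$, in line with the $O(d^2)$ count stated above.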



Depending on the assumed model class, TiMINo causality has to be provided with a fitting method. 
Here, we chose ar, gam and gptk in R (http://www.r-project.org/) for linear models, generalized 
additive models and GP regression, respectively. We call the resulting methods TiMINo-linear, TiMINo-gam and TiMINo-GP. 
For the first two, AIC determines the order of the process. All fitting methods are used in 
a "standard way". For gam we used the built-in nonparametric smoothing splines. For the GP we used 
zero mean, a squared exponential covariance function and a Gaussian likelihood. The hyper-parameters are 
automatically chosen by marginal likelihood optimization. 

To test for independence between a residual time series $N^k_t$ and another time series $X^i_t$, $i \in S$, we shift 
the latter time series up to the maximal order $\pm p$ (but at least up to $\pm 4$); for each of those combinations 
we perform HSIC [Gretton et al., 2008], an independence test for iid data. One could also use a test 
based on cross-correlation that can be derived from Thm 11.2.3 in Brockwell and Davis [1991]. This is 
related to what is done in transfer function modeling [e.g. §13.1 in Brockwell and Davis, 1991], which is 
restricted to two time series and linear functions. But testing for cross-correlation is often not enough: 
if no time structure is present (iid data), correlation tests are most often insufficient because they miss nonlinear dependences. 
Also, Experiments 1 and 5 describe situations in which cross-correlations fail. To reduce the running 
time, however, one can use cross-correlation to determine the graph structure and use HSIC as a final 
model check. For HSIC we used a Gaussian kernel; as in Gretton et al. [2008], the bandwidth is chosen 
to be the median distance of the input data. This is a heuristic but well-established choice. 
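The shifted independence test described above can be sketched as follows. Assumptions of ours, not taken from the text: a permutation p-value replaces the HSIC test of Gretton et al. [2008], the residual and the other series are equally long arrays aligned in time, and the Bonferroni correction over all shifts (mentioned in Section 5.3) is applied to the smallest p-value. The shift range $\pm\max(p, 4)$ and the median-distance bandwidth follow the paragraph above.

```python
import numpy as np

def centered_gram(v):
    """Gaussian-kernel Gram matrix (median-distance bandwidth), double-centered."""
    d = np.abs(v[:, None] - v[None, :])
    sigma = np.median(d[d > 0])
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()

def hsic_perm_pvalue(a, b, rng, n_perm=250):
    """Permutation p-value for the biased HSIC statistic of two 1-d samples."""
    Ka, Kb = centered_gram(a), centered_gram(b)
    stat = np.sum(Ka * Kb)
    exceed = 0
    for _ in range(n_perm):
        q = rng.permutation(len(b))
        exceed += np.sum(Ka * Kb[np.ix_(q, q)]) >= stat
    return (1 + exceed) / (1 + n_perm)

def residual_independent(res, x, p, alpha=0.05, rng=None):
    """Test res_t against x_{t+h} for all |h| <= max(p, 4), Bonferroni-corrected."""
    rng = np.random.default_rng() if rng is None else rng
    H, T = max(p, 4), len(res)
    pvals = []
    for h in range(-H, H + 1):
        lo, hi = max(0, -h), min(T, T - h)        # times t with 0 <= t + h < T
        pvals.append(hsic_perm_pvalue(res[lo:hi], x[lo + h:hi + h], rng))
    return min(1.0, min(pvals) * len(pvals)) > alpha, pvals

# Usage: x depends on the residual two steps earlier, so the shift h = +2 reveals it
rng = np.random.default_rng(0)
res = rng.standard_normal(300)
x = np.zeros(300)
x[2:] = 0.8 * res[:-2]
x += 0.3 * rng.standard_normal(300)
dep_ok, dep_pvals = residual_independent(res, x, p=2, rng=rng)
ind_ok, ind_pvals = residual_independent(res, rng.standard_normal(300), p=2, rng=rng)
print(dep_ok, ind_ok)
```

Note that the number of permutations bounds the smallest attainable p-value, so it has to grow with the number of shifts for the corrected test to be able to reject at all.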
Note that any other fitting method and independence test can be used as well. Although our choices work 
well in practice, we do not claim that they are optimal. 



5.2 Partial causal discovery 

Let $X_t$ "almost" satisfy a TiMINo model, that is, some time series are unobserved or some functional 
relationships are not included in the model. We expect that the full discovery method remains undecided. 
One can modify the method such that it tries to discover parts of the causal graph: Whenever no $k$ with 
independent residuals is found in line 8 of Algorithm 1, one subtracts a subset $S_0$ from the current version 
of $S$ (first subtract one element, then any combination of two, etc.) and repeats. If the method is able 
to fit a TiMINo model using only the remaining set $S \setminus S_0$, it outputs this solution and $S_0$, which has been 
excluded. Since there are $2^{\#S}$ subsets, this is only feasible for small $S$ (see Exp. 6). This method may 
also be useful for the iid case; its theoretical properties remain to be investigated. 
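The search order over excluded subsets $S_0$ can be sketched as below. Here `fit_timino` is a hypothetical stand-in for a full run of Algorithm 1 on the remaining series, and the toy example only mimics the situation of Experiment 6, where two observed series share a latent cause. The generator makes the exponential number of candidates explicit.

```python
from itertools import combinations

def exclusion_candidates(S):
    """Non-empty proper subsets S0 of S, smallest first (one element, then pairs, ...)."""
    for size in range(1, len(S)):
        for S0 in combinations(S, size):
            yield set(S0)

def partial_discovery(S, fit_timino):
    """Try to fit a TiMINo on S minus S0 for increasingly large S0.

    fit_timino: hypothetical routine returning a graph for the given series, or None
    if no model with independent residuals is found.
    """
    for S0 in exclusion_candidates(S):
        graph = fit_timino([v for v in S if v not in S0])
        if graph is not None:
            return graph, S0
    return None, None

# Toy stand-in for the fitting routine: fails whenever Y and W (which share the
# latent cause X in Experiment 6) are both present.
def toy_fit(remaining):
    return None if {"Y", "W"} <= set(remaining) else {v: [] for v in remaining}

graph, excluded = partial_discovery(["B", "Y", "W"], toy_fit)
print(excluded)   # {'Y'}
```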



5.3 Weaknesses 

(i) In principle, it may happen that the model assumptions are violated, but one can nevertheless fit a 
model in the wrong direction (that is why we wrote "remains mostly undecided"). This requires an 
"unnatural" fine tuning of the functions, and Janzing and Steudel [2010] argue that in the case of causal 
sufficiency it cannot occur if one believes in the "independence" of cause and causal mechanism. Also, 
(i) is relevant only when there are time series without time structure or the data are non-faithful (see 
Theorem 1). We do not provide a precise analysis of the case with confounders, but analyze this situation 
empirically in Experiment 1. (ii) The null hypothesis of the independence test represents independence, 
although the scientific discovery of a causal relationship should rather be the alternative hypothesis. This 
fact may lead to wrong causal conclusions (instead of "I do not know") on small data sets since we cannot 
reject independence for the wrong direction. This effect is strengthened by the Bonferroni correction of 
the HSIC-based independence test. This may require modifications when the number of time series is 
very high. It is thus useful to develop heuristics for "minimal" sample sizes. (iii) For large sample sizes, 
even the smallest differences between the true data generating process and the model may lead to rejected 
independence tests [discussed by Peters et al., 2011a]. 



6 Experiments 

Code is available in the supplementary material and will be made available online. 
6.1 Artificial Data 

We always included instantaneous effects, fitted models up to order $p = 2$ or $p = 6$, and set $\alpha = 0.05$. 

Experiment 1: Confounder with time lag. We simulate 100 data sets (length 1000) from 
$Z_t = a \cdot Z_{t-1} + N_{Z,t}$, $X_t = 0.6 \cdot X_{t-1} + 0.5 \cdot Z_{t-1} + N_{X,t}$, $Y_t = 0.6 \cdot Y_{t-1} + 0.5 \cdot Z_{t-2} + N_{Y,t}$, with $a$ between 0 
and 0.95 and $N_{\cdot,t} \sim 0.4 \cdot \mathcal{N}(0,1)^3$. Here, $Z$ is a hidden common cause for $X$ and $Y$. For all $a$, $X_t$ contains 
information about $Z_{t-1}$ and $Y_{t+1}$ (see Figure 1); G-causality and TS-LiNGAM wrongly infer $X \to Y$. 
For large $a$, $Y_t$ contains additional information about $X_{t+1}$, which leads to the wrong arrow $Y \to X$. 
TiMINo causality does not decide for any $a$. The nonlinear methods perform very similarly (not shown). 
Note that for $a = 0$, a cross-correlation test is not enough to reject $X \to Y$. Further, all methods fail for 
$a = 0$ and Gaussian noise. 
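For reproducibility, the data-generating process of Experiment 1 can be written as below. This is a sketch; the zero initial values and the use of NumPy's default generator are our choices. Only $X$ and $Y$ would be handed to a causal discovery method; the returned $Z$ is the hidden confounder. For $a = 0$, $X_t$ and $Y_{t+1}$ are correlated through $Z_{t-1}$ although neither series causes the other.

```python
import numpy as np

def simulate_exp1(a, T=1000, rng=None):
    """Z hidden; X and Y both driven by lagged Z; noise 0.4 * N(0,1)^3."""
    rng = np.random.default_rng() if rng is None else rng
    N = 0.4 * rng.standard_normal((T, 3)) ** 3
    Z, X, Y = np.zeros(T), np.zeros(T), np.zeros(T)
    for t in range(2, T):
        Z[t] = a * Z[t - 1] + N[t, 0]
        X[t] = 0.6 * X[t - 1] + 0.5 * Z[t - 1] + N[t, 1]
        Y[t] = 0.6 * Y[t - 1] + 0.5 * Z[t - 2] + N[t, 2]
    return X, Y, Z

X, Y, Z = simulate_exp1(a=0.0, rng=np.random.default_rng(0))
lagged_corr = np.corrcoef(X[:-1], Y[1:])[0, 1]   # correlation of X_t with Y_{t+1}
print(lagged_corr)
```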

Experiment 2: Linear, Gaussian with instantaneous effects. We sample 100 data sets (length 
2000) from $X_t = A_1 \cdot X_{t-1} + N_{X,t}$, $W_t = A_2 \cdot W_{t-1} + A_3 \cdot X_t + N_{W,t}$, $Y_t = A_4 \cdot Y_{t-1} + A_5 \cdot W_{t-1} + N_{Y,t}$, 
$Z_t = A_6 \cdot Z_{t-1} + A_7 \cdot W_t + A_8 \cdot Y_{t-1} + N_{Z,t}$ and $N_{\cdot,t} \sim 0.4 \cdot \mathcal{N}(0, 1)$ and $A_i$ iid from $\mathcal{U}([-0.8, -0.2] \cup [0.2, 0.8])$. We 
regard the graph containing $X \to W \to Y \to Z$ and $W \to Z$ as correct. TS-LiNGAM and G-causality 
are not able to recover the true structure (see Table 1). 
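A sketch of the Experiment 2 generator, under our own choices of zero initial values and the index convention `A[0]` $= A_1$, ..., `A[7]` $= A_8$. Coefficients uniform on $[-0.8,-0.2] \cup [0.2,0.8]$ are drawn as a uniform magnitude in $[0.2, 0.8]$ with a random sign, which gives the same distribution because the two intervals have equal length. Within each time step, the instantaneous effects $X_t \to W_t$ and $W_t \to Z_t$ dictate the update order.

```python
import numpy as np

def draw_coefficients(rng, size):
    """iid draws from U([-0.8, -0.2] U [0.2, 0.8])."""
    return rng.choice([-1.0, 1.0], size) * rng.uniform(0.2, 0.8, size)

def simulate_exp2(T=2000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    A = draw_coefficients(rng, 8)              # A[0], ..., A[7] play the role of A_1, ..., A_8
    N = 0.4 * rng.standard_normal((T, 4))      # N_{.,t} ~ 0.4 * N(0, 1)
    X, W, Y, Z = (np.zeros(T) for _ in range(4))
    for t in range(1, T):
        X[t] = A[0] * X[t - 1] + N[t, 0]
        W[t] = A[1] * W[t - 1] + A[2] * X[t] + N[t, 1]                     # instantaneous X_t -> W_t
        Y[t] = A[3] * Y[t - 1] + A[4] * W[t - 1] + N[t, 2]
        Z[t] = A[5] * Z[t - 1] + A[6] * W[t] + A[7] * Y[t - 1] + N[t, 3]   # instantaneous W_t -> Z_t
    return np.column_stack([X, W, Y, Z]), A

data, A = simulate_exp2(rng=np.random.default_rng(0))
print(data.shape, np.round(A, 2))
```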

Figure 1: Exp. 1: Part of the causal full time graph with hidden common cause $Z$ (top left). TiMINo 
causality does not decide (top right), whereas G-causality and TS-LiNGAM wrongly infer causal connections 
between $X$ and $Y$ (bottom). 

Table 1: Exp. 2: Gaussian data and linear instantaneous effects: only TiMINo mostly discovers the correct 
DAG. 

DAG        lin. Granger   TiMINo-lin   TS-LiNGAM
correct    13%            83%          19%
wrong      87%            7%           81%
no dec.    0%             10%          0%

Experiment 3: Nonlinear, non-Gaussian without instantaneous effects. We simulate 100 
data sets (length 500) from $X_t = 0.8 X_{t-1} + 0.3 N_{X,t}$, $Y_t = 0.4 Y_{t-1} + (X_{t-1} - 1)^2 + 0.3 N_{Y,t}$, $Z_t = 0.4 Z_{t-1} + 
0.5 \cos(Y_{t-1}) + \sin(Y_{t-1}) + 0.3 N_{Z,t}$, with $N_{\cdot,t} \sim \mathcal{U}([-0.5, 0.5])$ (similar results for other noise distributions, 
e.g. exponential). Thus, $X \to Y \to Z$ is the ground truth. Nonlinear G-causality fails since the 
implementation is only pairwise and it thus always infers an effect from $X$ to $Z$. Linear G-causality 
cannot remove the nonlinear effect from $X_{t-2}$ to $Z_t$ by using $Y_{t-1}$ and gives many wrong answers. 
TiMINo-linear also assumes a wrong model, but does not make any decision. TiMINo-gam and TiMINo-GP 
work well on this data set (Table 2). This specific choice of parameters shows that a significant difference 
in performance is possible. For other parameters (e.g. less impact of the nonlinearity), G-causality and 
TS-LiNGAM still assume a wrong model but make fewer mistakes. 

Experiment 4: Non-additive interaction. We simulate 100 data sets with different lengths from 
$X_t = 0.2 \cdot X_{t-1} + 0.9 \cdot N_{X,t}$, $Y_t = -0.5 + \exp(-(X_{t-1} + X_{t-2})^2) + 0.1 \cdot N_{Y,t}$, with $N_{\cdot,t} \sim \mathcal{N}(0, 1)$. Figure 2 
shows that TiMINo-linear and TiMINo-gam remain mainly undecided, whereas TiMINo-GP performs 
well. For small sample sizes, one observes two effects: GP regression does not obtain accurate estimates 
of the residuals; these estimates are not independent, and thus TiMINo-GP more often remains undecided. 
Also, TiMINo-gam gives more correct answers than one would expect, due to more type II errors. Linear 
G-causality and TS-LiNGAM give more than 90% incorrect answers, but nonlinear G-causality is most 
often correct (not shown). Bad model assumptions do not always lead to incorrect causal conclusions. 

Experiment 5: Nonlinear dependence of residuals. In Experiment 1, TiMINo equipped 
with a cross-correlation test inferred a causal edge, although there was none. The opposite is also possible: 
$X_t = -0.5 \cdot X_{t-1} + N_{X,t}$, $Y_t = -0.5 \cdot Y_{t-1} + X_{t-1}^2 + N_{Y,t}$ and $N_{\cdot,t} \sim 0.4 \cdot \mathcal{N}(0, 1)$ (length 1000). TiMINo-gam 
with cross-correlation infers no causal link between $X$ and $Y$, whereas TiMINo-gam with HSIC correctly 
identifies $X \to Y$. 
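Why cross-correlation misses this dependence can be checked directly. The sketch below simulates the Experiment 5 process, fits $Y_t$ linearly on $Y_{t-1}$ only (a simplification of ours; the experiment uses gam), and compares the correlation of the residuals with $X_{t-1}$ and with $X_{t-1}^2$. The latter is a crude stand-in for a kernel test such as HSIC: because $X$ is symmetric around zero, the residuals are uncorrelated with $X_{t-1}$ even though they depend on it.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1000
N = 0.4 * rng.standard_normal((T, 2))
X, Y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    X[t] = -0.5 * X[t - 1] + N[t, 0]
    Y[t] = -0.5 * Y[t - 1] + X[t - 1] ** 2 + N[t, 1]

# Residuals of Y_t after a linear fit on its own past only (X is left out)
A = np.column_stack([np.ones(T - 1), Y[:-1]])
res = Y[1:] - A @ np.linalg.lstsq(A, Y[1:], rcond=None)[0]

lin_corr = np.corrcoef(res, X[:-1])[0, 1]        # near 0: no cross-correlation at lag 1
sq_corr = np.corrcoef(res, X[:-1] ** 2)[0, 1]    # clearly positive: dependence remains
print(lin_corr, sq_corr)
```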

Experiment 6: Partial causal discovery. We sample 100 data sets (length 600) from $X_t = 
0.5 \cdot X_{t-1} + N_{X,t}$, $B_t = 0.5 \cdot B_{t-1} + N_{B,t}$, $A_t = 0.5 \cdot A_{t-1} + 0.5 \cdot B_{t-1} + N_{A,t}$, $Y_t = 0.5 \cdot Y_{t-1} - 0.9 \cdot X_{t-1} + 
0.8 \cdot B_{t-1} + N_{Y,t}$, $W_t = 0.5 \cdot W_{t-1} + 0.8 \cdot X_{t-1} + N_{W,t}$ and $N_{\cdot,t} \sim 0.4 \cdot \mathcal{U}([-0.5, 0.5])$. Let $X_t$ be latent. 
The standard method finds $A_t$ as a "sink time series" and halts in iteration two (line 8 in Algorithm 1). 
Instead of outputting "I do not know", the partial discovery method described in Section 5.2 is able 
to correctly infer this DAG (see Figure 3) in 82% of the cases (18% wrong answers). G-causality and 
TS-LiNGAM give only wrong answers. 

Table 2: Exp. 3: Since the data are nonlinear, linear G-causality and TS-LiNGAM give wrong answers; 
TiMINo-lin does not decide. Nonlinear G-causality fails because it analyzes the causal structure between 
pairs of time series. 

Figure 2: Exp. 4: TiMINo-GP (blue) works reliably for long time series. TiMINo-linear (red) and TiMINo-gam 
(black) mostly remain undecided. 

Figure 3: Exp. 6: The true causal summary time graph (left) cannot be recovered because $X_t$ is unobserved. 
TiMINo gives a partial result (right). 



6.2 Real Data 



We fitted models up to order 6 and included instantaneous effects. For TiMINo, "correct" means that TiMINo-gam 
makes the correct decision and TiMINo-linear is correct or undecided. TiMINo-GP always remains 
undecided because there are too few data points to fit such a general model. Again, $\alpha$ is set to 0.05. 

Experiment 7: Gas Furnace [Box et al., 2008, length 296]. $X_t$ describes the input gas rate 
and $Y_t$ the output CO2. We regard $X \to Y$ as being true. TS-LiNGAM, G-causality, TiMINo-lin and 
TiMINo-gam correctly infer $X \to Y$. Disregarding time information leads to a wrong causal conclusion: 
the method described by Hoyer et al. [2009] leads to a p-value of 4.8% in the correct and 9.1% in the 
false direction, i.e. at level 0.05 it rejects the true model but not the false one. 

Experiment 8: Old Faithful [Azzalini and Bowman, 1990, length 194]. $X_t$ contains the duration 
of an eruption and $Y_t$ the time interval to the next eruption of the Old Faithful geyser. We regard $X \to Y$ 
as the ground truth. Although the time intervals $[t, t + 1]$ do not have the same length for all $t$, we model 
the data as two time series. TS-LiNGAM and TiMINo give correct answers, whereas linear G-causality 
infers $X \to Y$, and nonlinear G-causality infers $Y \to X$. 

Experiment 9: Temperature (available at https://webdav.tuebingen.mpg.de/cause-effect/, 
length 16382). $X_t$ are indoor and $Y_t$ outdoor measurements (recorded every 5 minutes); we expect that 
there is a causal link $Y \to X$. TS-LiNGAM wrongly infers $X \to Y$ and both G-causality methods infer 
a bidirected arrow. TiMINo remains undecided. Maybe the data are causally insufficient: time may confound 
outdoor temperature and the usage of heating, the latter being a direct cause of indoor temperature. 
Also, $Y$ may cause heating. Such a model does not allow for a TiMINo from $Y$ to $X$. 

Experiment 10: Abalone (no time structure). The abalone data set [Asuncion and Newman, 2007] 
contains age $X_t$ and diameter $Y_t$ of a certain shellfish (among other variables, which lead to similar results). If 
we model 1000 randomly chosen samples as time series, G-causality (both linear and nonlinear) infers no 
causal relation, as expected. TS-LiNGAM wrongly infers $Y \to X$, which is probably due to the nonlinear 
relationship. TiMINo gives the correct result. 

Experiment 11: Dairy (confounder). We consider 10 years of weekly prices for butter $X_t$ and 
cheddar cheese $Y_t$ [Gould, 2007, length 522]. They are strongly correlated, but we expect this correlation 
to be due to the (hidden) milk price $M_t$: $X \leftarrow M \to Y$. TiMINo does not decide, whereas TS-LiNGAM 
and G-causality wrongly infer $X \to Y$. This may be due to different time lags of the confounder (cheese 
has longer storing and maturing times than butter). 

The phase slope index [Nolte et al., 2008] performed well only in Exp. 6; in all other experiments it 
either gave wrong results or did not decide. Due to space constraints we omit details about this method. 



7 Conclusions and Future Work 

This paper shows how causal inference benefits from the framework of functional models. TiMINo 
causality can be seen as an extension of methods from the iid case, but the benefits compared to other time 
series methods are substantial and important: It comes with an identifiability result that is more general than 
existing results and leads to a practical algorithm that can remain undecided instead of making a wrong 
decision. TiMINo is applicable to multivariate, linear, nonlinear and instantaneous interactions 
and can also discover partial structures. On the data sets considered, it outperforms existing methods. 

We think the following investigations would be worthwhile: (1) Applying more complex models (like 
heteroscedastic models) and preprocessing the data (removing trends, periodicities, etc.) may decrease 
the number of cases where TiMINo causality is undecided. (2) Checking for autocorrelations in the 
residuals is another possible model check and not included yet. (3) In the case of non-instantaneous 
feedback loops, one should find a method to fit the model structure that is faster than brute-force search. 
(4) Although we report promising results, an extensive evaluation of this method on even more real data 
sets is necessary. This lies beyond the scope of the present conference paper. 



8 Appendix 

Lemma 1 If $X_t = (X^i_t)_{i \in V}$ satisfies a TiMINo model, each variable $X^i_t$ is conditionally independent of 
each of its non-descendants given its parents. 

Proof. With $S := \mathbf{PA}(X^i_t) = \bigcup_{k=0}^{p} (\mathbf{PA}^i_k)_{t-k}$ and equation (1) we get $X^i_t \mid_{S=s} = f_i(s, N^i_t)$ for an $s$ with 
$p(s) > 0$. Any non-descendant of $X^i_t$ can be written as a function of all noise variables from its ancestors 
and is therefore independent of $X^i_t$ given $S = s$. For this proof it is crucial that we consider time series 
for $t \in \mathbb{N}$. We believe that a similar statement holds for $t \in \mathbb{Z}$, which only introduces technical difficulties. 
□ 



Proof of Theorem 1. Suppose that $X_t$ allows two different representations as a TiMINo that lead to two 
different full time graphs $\mathcal{G}$ and $\mathcal{G}'$. (i) First we assume that $\mathcal{G}$ and $\mathcal{G}'$ do not differ in the instantaneous 
effects: $\mathbf{PA}^i_0$ (in $\mathcal{G}$) $= \mathbf{PA}^i_0$ (in $\mathcal{G}'$) for all $i$. Without loss of generality, there is some $k > 0$ and an edge 
$X^1_{t-k} \to X^2_t$, say, that is in $\mathcal{G}$ but not in $\mathcal{G}'$. From $\mathcal{G}'$ and Lemma 1 we have that $X^1_{t-k} \perp\!\!\!\perp X^2_t \mid S$, where 
$S = \big(\{X^i_{t-l},\ 1 \le l \le p,\ i \in V\} \cup \mathrm{ND}^2_t\big) \setminus \{X^1_{t-k}, X^2_t\}$, and $\mathrm{ND}^2_t$ are all $X^i_t$ that are non-descendants (with respect to 
instantaneous effects) of $X^2_t$. Applied to $\mathcal{G}$, causal minimality leads to a contradiction: $X^1_{t-k} \not\perp\!\!\!\perp X^2_t \mid S$. 
Now we suppose that $\mathcal{G}$ and $\mathcal{G}'$ differ in the instantaneous effects. This time we choose $S = \{X^i_{t-l},\ 1 \le l \le p,\ i \in V\}$. 
Then for each $s$ and $i$ we have $X^i_t \mid_{S=s} = f_i\big(s, (\mathbf{PA}^i_0)_t, N^i_t\big)$, where $(\mathbf{PA}^i_0)_t$ are all instantaneous 
parents of $X^i_t$ conditioned on $S = s$. All $X^i_t \mid_{S=s}$ with the instantaneous effects describe two different 
structures of an IFMOC. This contradicts the identifiability results by Peters et al. [2011b]. 
(ii) Because of Lemma 1 and faithfulness, $\mathcal{G}$ and $\mathcal{G}'$ only differ in the instantaneous effects. But each instantaneous 
arrow $X^i_t \to X^j_t$ forms a v-structure together with $X^j_{t-k} \to X^j_t$; the latter exists because of the time 
structure, and $X^j_{t-k}$ cannot be connected with $X^i_t$ since this would introduce a cycle in the summary time 
graph. □ 